Welcome to our advanced R project working on a marketing campaign data set focussing on term deposit offers! You can find this project also under the following GitHup Repository. This project focuses on the classification problem that a Portugese bank was facing, which can be found under following link. The bank wanted to know which customers are more likely to accept a term deposit offer in a telemarketing campaign. Thus a predicitive model to answer this question is created in order to optimize the bank’s marketing resources. The general idea behind this is that it is preferrable to call customers that are more likely to accept the offer because each call means marketing cost and with such a predicitive model not only cost can be controlled better but also the actual return on the marketing campaign can be improved.
Our worked is divided into two parts: Data exploration and modelling (including feature engineering). Thus two markdowns are provided to give a better overview.
But before we go into the project itself, lets introduce our team tackled this problem by dividing the machine learning process into three different workstreams:
O1 R Masters :)
Before we start with the first part of our project, which is data exploration, we load the required R libraries from a separate R script called “libraries”.
source('notebooks/libraries.R')
We were provided with two data sets. The first is a train data set, which includes the actual outcome of the marketing activity for each contacted customer (labeled train data set). The second is a “blind” test data set that does not include the actual outcome of the campaign. At the end of our project we will use our best model to make predictions on this test data set. One can imagine this like putting the model into production a feeding it with new data in order to predict which customers will accept the offer and which will not. The overall data set contains 45,211 rows and the original data set can be found in the UCI Machine Learning Repository. It is important to note that the data sets we were provided with are not 100% the same compared to the repository. Some of the features from the repository are not included here while other features such as balance, educatio, day are added or transformed. However we know based on the information on the data set provided that the data covers activities from from May 2008 to November 2010. Comparing the size of the train and the blind test set we see that the over all data set was split into 80% train and 20% test.
The provided data sets contain general as well as banking specific information about each contacted client. Such data can usually be found in the CRM system of a bank. The bank intends to use this data in order to identify patterns that a related to the likeliness of a customer responding positive to the offered marketing campaign. It is important to notice that this campaign using phone calls is about a term deposit offer. A term deposit is a specific banking product that is a fixed-term investment that includes the deposit of money into an account at a financial institution. Term deposit investments usually carry short-term maturities ranging from one month to a few years and will have varying levels of required minimum deposits. Thus the developed predicted model should only be used for marketing campaigns around term deposits since the same customer can respond completly different when he is offered a different banking product such as a over draft protection or a loan.
Now that we know what the data set is about in general let us take a closer look at it and see what it actually contains in terms of features.
Both data sets include the same 16 features while the train set contains an additional column storing the outcome of the marketing campaign, which represents our target variable. Since the target variable is either yes (term deposit offer accepted) ot no (term deposit offer not accepted), we are facing a binary classification problem.
The next step is to take a closer look at the provided features.
raw_train_data<-fread('Data/BankCamp_train.csv', stringsAsFactors = F)
raw_test_data<-fread('Data/BankCamp_test.csv', stringsAsFactors = F)
str(raw_train_data)
## Classes 'data.table' and 'data.frame': 36168 obs. of 17 variables:
## $ age : int 50 47 56 36 41 32 26 60 39 55 ...
## $ job : chr "entrepreneur" "technician" "housemaid" "blue-collar" ...
## $ marital : chr "married" "married" "married" "married" ...
## $ education: chr "primary" "secondary" "primary" "primary" ...
## $ default : chr "yes" "no" "no" "no" ...
## $ balance : int 537 -938 605 4608 362 0 782 193 2140 873 ...
## $ housing : chr "yes" "yes" "no" "yes" ...
## $ loan : chr "no" "no" "no" "no" ...
## $ contact : chr "unknown" "unknown" "cellular" "cellular" ...
## $ day : int 20 28 19 14 12 4 29 12 16 3 ...
## $ month : chr "jun" "may" "aug" "may" ...
## $ duration : int 11 176 207 284 217 233 297 89 539 131 ...
## $ campaign : int 15 2 6 7 3 3 1 2 1 1 ...
## $ pdays : int -1 -1 -1 -1 -1 276 -1 -1 -1 -1 ...
## $ previous : int 0 0 0 0 0 2 0 0 0 0 ...
## $ poutcome : chr "unknown" "unknown" "unknown" "unknown" ...
## $ y : chr "no" "no" "no" "no" ...
## - attr(*, ".internal.selfref")=<externalptr>
str(raw_test_data)
## Classes 'data.table' and 'data.frame': 9043 obs. of 16 variables:
## $ age : int 58 43 51 56 32 54 58 54 32 38 ...
## $ job : chr "management" "technician" "retired" "management" ...
## $ marital : chr "married" "single" "married" "married" ...
## $ education: chr "tertiary" "secondary" "primary" "tertiary" ...
## $ default : chr "no" "no" "no" "no" ...
## $ balance : int 2143 593 229 779 23 529 -364 1291 0 424 ...
## $ housing : chr "yes" "yes" "yes" "yes" ...
## $ loan : chr "no" "no" "no" "no" ...
## $ contact : chr "unknown" "unknown" "unknown" "unknown" ...
## $ day : int 5 5 5 5 5 5 5 5 5 5 ...
## $ month : chr "may" "may" "may" "may" ...
## $ duration : int 261 55 353 164 160 1492 355 266 179 104 ...
## $ campaign : int 1 1 1 1 1 1 1 1 1 1 ...
## $ pdays : int -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 ...
## $ previous : int 0 0 0 0 0 0 0 0 0 0 ...
## $ poutcome : chr "unknown" "unknown" "unknown" "unknown" ...
## - attr(*, ".internal.selfref")=<externalptr>
For our following data exploration we are creating two different vectors, one that stores all the features that are categorical and one with the numeric/continuous features.
discrete_var<-c("job", "marital", "education", "default", "housing", "loan","contact", "month", "poutcome")
continuous_var<-c("age", "balance","day", "duration", "campaign", "pdays", "previous")
First of all we take a look how our target is distributed. This means looking at how many of the contacted customers accepted the offer compared to how many did not. We see here that most of the customers did not accept the offer (88.4%). Only 11.6% of the customers that were contacted actually accepted the offer. If we assume that the customers were contacted randomly without any preselection, we can say that only one out of nine customers accept the offer.
This insight is very important for our later machine learning model. Because if we would predict all of the customers are a ‘no’ we would end up with an accuracy of almost 90%. Also if ‘no’ is defined as the positive class and we predict all of the rows as ‘no’ we end up with very high sensitivity results (True positives/[True positives+False negatives]). However, the idea behind this model is to identify customers that are more likely to accept our offer and thus we need not only to define ‘yes’ as the positive class but also use a sampling method to train our model due to the high class imbalance.
target_dist<-ggplot(raw_train_data, aes(y, fill=y))+
geom_bar() + scale_fill_manual(values=c("#995052", "#529950")) +
xlab("Campaign offer accepted")+
geom_text(stat="count",aes(label=scales::percent((..count..)/sum(..count..))),
position=position_dodge(1), size=3, vjust = -.5)+
ggtitle("Distribution of target variable") + theme_minimal() +
theme(text = element_text(face = "bold"), legend.position = "none",
panel.grid.major = element_blank(), panel.grid.minor = element_blank(),
plot.title = element_text(hjust = 0.5))
target_dist
The next step is to look at the distribution of our features. This helps us to identify if some variables are skewed or unbalanced. We will start in this case with continous variables and look at the quintiles (including min and max), the mean and the standard deviation. This helps us especially when looking at the minimum to identify special cases such as pdays -1 compared to only looking at the histograms.
In the case of pdays we see we have more than 75% of the rows that contain a -1 which means the customer was never contacted before. This is also reflected in the feature previous that shows a 0 for most customers, counting the contacts performed before this campaign.
cont_var <- as.data.frame(raw_train_data)
cont_var <- cont_var[ , (names(cont_var) %in% continuous_var)]
distribution <- as.data.frame(t(sapply(cont_var, quantile)))
distribution$Mean <- sapply(cont_var, mean)
distribution$SD <- sapply(cont_var, sd)
datatable(round(distribution, 2))
Looking at the density plots of out continous variables we can see that all of them are highly skewed to the right. The exception here is the day which reflects the day within the month. However, we see a variation across days and not a uniform distribution meaning we have days within the month were more customer are called compared to others.
cont_var_melt <- as.data.frame(melt(cont_var))
cont_dist <- ggplot(cont_var_melt, aes(value)) +
geom_density(aes(fill = variable)) +
facet_wrap(~variable, scales = "free", nrow = 3) +
labs(x = "", y = "", fill = "") + theme_minimal() +
scale_fill_tableau() + ggtitle("Distribution of each continous variable") +
theme(text = element_text(face = "bold"), legend.position = "none",
panel.grid.major = element_blank(), panel.grid.minor = element_blank(),
plot.title = element_text(hjust = 0.5))
cont_dist
The next
## Create a new column in the test dataset
raw_test_data$y <- NA
## Creating a column "dataType" for both train and test datasets and assign the value 'train' & 'test'
raw_train_data$dataType <- "train"
raw_test_data$dataType <- "test"
## Merging both train and test datasets
dataset <- rbind(raw_train_data, raw_test_data)
ggplot(dataset, aes(x=age, color = dataType)) +
geom_density(alpha = 0.7, size=1.5) +
ggtitle("Age Distribution") + theme_classic() +
scale_color_manual(values=c("#e08926", "#3526e0"))
ggplot(dataset, aes(x=balance, color = dataType)) +
geom_density(alpha = 0.7, size=1.5) +
ggtitle("Balance Distribution") + theme_classic() +
scale_color_manual(values=c("#e08926", "#3526e0"))
ggplot(dataset, aes(x=day, color = dataType)) +
geom_density(alpha = 0.7, size=1.5) +
ggtitle("Day Distribution") + theme_classic() +
scale_color_manual(values=c("#e08926", "#3526e0"))
ggplot(dataset, aes(x=duration, color = dataType)) +
geom_density(alpha = 0.7, size=1.5) +
ggtitle("Duration Distribution") + theme_classic() +
scale_color_manual(values=c("#e08926", "#3526e0"))
ggplot(dataset, aes(x=campaign, color = dataType)) +
geom_density(alpha = 0.7, size=1.5) +
ggtitle("Campaign Distribution") + theme_classic() +
scale_color_manual(values=c("#e08926", "#3526e0"))
ggplot(dataset, aes(x=pdays, color = dataType)) +
geom_density(alpha = 0.7, size=1.5) +
ggtitle("pdays Distribution") + theme_classic() +
scale_color_manual(values=c("#e08926", "#3526e0"))
ggplot(dataset, aes(x=previous, color = dataType)) +
geom_density(alpha = 0.7, size=1.5) +
ggtitle("Previous Distribution") + theme_classic() +
scale_color_manual(values=c("#e08926", "#3526e0"))
df_disc <- raw_train_data[, ..discrete_var]
df_disc <- sapply(df_disc, as.factor)
df_disc <- as.data.frame(melt(df_disc))
disc_dist <- ggplot(df_disc, aes(value)) +
geom_bar(aes(fill = Var2)) +
scale_y_continuous(breaks = scales::pretty_breaks(n = 5)) +
scale_x_discrete(expand = c(0,0)) +
facet_wrap(~Var2, scales = "free", nrow = 2) +
scale_fill_tableau() +
ggtitle("Count of each discrete variable") +
labs(fill = "", x = "", y = "") +
theme_minimal() +
theme(text = element_text(face = "plain"),
legend.position = "none",
axis.text.x = element_text(size = 7, angle = 90),
panel.grid.major = element_blank(),
panel.grid.minor = element_blank(),
plot.title = element_text(hjust = 0.5))
disc_dist
job_train <- ggplot(raw_train_data, aes(x=job)) +
geom_bar(fill ="#3526e0") +
ggtitle("Job Distribution - Train") + labs(y = "", x="") + theme_minimal() +
theme(text = element_text(face = "plain"), legend.position = "none",
axis.text.x = element_text(size = 7, angle = 90),
panel.grid.major = element_blank(),
panel.grid.minor = element_blank(),
plot.title = element_text(hjust = 0.5))
job_test <- ggplot(raw_test_data, aes(x=job)) +
geom_bar(fill = "#e08926") + ggtitle("Job Distribution - Test") + labs(y = "", x="") + theme_minimal() + theme(text = element_text(face = "plain"),
legend.position = "none",
axis.text.x = element_text(size = 7, angle = 90),
panel.grid.major = element_blank(),
panel.grid.minor = element_blank(),
plot.title = element_text(hjust = 0.5))
grid.arrange(job_train, job_test, ncol=2)
marital_train <- ggplot(raw_train_data, aes(x=marital)) +
geom_bar(fill ="#3526e0") + ggtitle("Marital Distribution - Train") +
labs(y = "", x="") + theme_minimal() + theme(text = element_text(face = "plain"),
legend.position = "none",
axis.text.x = element_text(size = 7, angle = 90),
panel.grid.major = element_blank(),
panel.grid.minor = element_blank(),
plot.title = element_text(hjust = 0.5))
marital_test <- ggplot(raw_test_data, aes(x=marital)) +
geom_bar(fill = "#e08926") + ggtitle("Marital Distribution - Test") +
labs(y = "", x="") + theme_minimal() + theme(text = element_text(face = "plain"),
legend.position = "none",
axis.text.x = element_text(size = 7, angle = 90),
panel.grid.major = element_blank(),
panel.grid.minor = element_blank(),
plot.title = element_text(hjust = 0.5))
grid.arrange(marital_train, marital_test, ncol=2)
education_train <- ggplot(raw_train_data, aes(x=education)) +
geom_bar(fill ="#3526e0") + ggtitle("Education Distribution - Train") +
labs(y = "", x="") + theme_minimal() + theme(text = element_text(face = "plain"),
legend.position = "none",
axis.text.x = element_text(size = 7, angle = 90),
panel.grid.major = element_blank(),
panel.grid.minor = element_blank(),
plot.title = element_text(hjust = 0.5))
education_test <- ggplot(raw_test_data, aes(x=education)) +
geom_bar(fill = "#e08926") + ggtitle("Education Distribution - Test") +
labs(y = "", x="") + theme_minimal() + theme(text = element_text(face = "plain"),
legend.position = "none",
axis.text.x = element_text(size = 7, angle = 90),
panel.grid.major = element_blank(),
panel.grid.minor = element_blank(),
plot.title = element_text(hjust = 0.5))
grid.arrange(education_train, education_test, ncol=2)
default_train <- ggplot(raw_train_data, aes(x=default)) +
geom_bar(fill ="#3526e0") + ggtitle("Default Distribution - Train") +
labs(y = "", x="") + theme_minimal() + theme(text = element_text(face = "plain"),
legend.position = "none",
axis.text.x = element_text(size = 7, angle = 90),
panel.grid.major = element_blank(),
panel.grid.minor = element_blank(),
plot.title = element_text(hjust = 0.5))
default_test <- ggplot(raw_test_data, aes(x=default)) +
geom_bar(fill = "#e08926") + ggtitle("Default Distribution - Test") +
labs(y = "", x="") + theme_minimal() + theme(text = element_text(face = "plain"),
legend.position = "none",
axis.text.x = element_text(size = 7, angle = 90),
panel.grid.major = element_blank(),
panel.grid.minor = element_blank(),
plot.title = element_text(hjust = 0.5))
grid.arrange(default_train, default_test, ncol=2)
housing_train <- ggplot(raw_train_data, aes(x=housing)) +
geom_bar(fill ="#3526e0") + ggtitle("Housing Distribution - Train") +
labs(y = "", x="") + theme_minimal() + theme(text = element_text(face = "plain"),
legend.position = "none",
axis.text.x = element_text(size = 7, angle = 90),
panel.grid.major = element_blank(),
panel.grid.minor = element_blank(),
plot.title = element_text(hjust = 0.5))
housing_test <- ggplot(raw_test_data, aes(x=housing)) +
geom_bar(fill = "#e08926") + ggtitle("Housing Distribution - Test") +
labs(y = "", x="") + theme_minimal() + theme(text = element_text(face = "plain"),
legend.position = "none",
axis.text.x = element_text(size = 7, angle = 90),
panel.grid.major = element_blank(),
panel.grid.minor = element_blank(),
plot.title = element_text(hjust = 0.5))
grid.arrange(housing_train, housing_test, ncol=2)
loan_train <- ggplot(raw_train_data, aes(x=loan)) +
geom_bar(fill ="#3526e0") + ggtitle("Loan Distribution - Train") +
labs(y = "", x="") + theme_minimal() + theme(text = element_text(face = "plain"),
legend.position = "none",
axis.text.x = element_text(size = 7, angle = 90),
panel.grid.major = element_blank(),
panel.grid.minor = element_blank(),
plot.title = element_text(hjust = 0.5))
loan_test <- ggplot(raw_test_data, aes(x=loan)) +
geom_bar(fill = "#e08926") + ggtitle("Loan Distribution - Test") +
labs(y = "", x="") + theme_minimal() + theme(text = element_text(face = "plain"),
legend.position = "none",
axis.text.x = element_text(size = 7, angle = 90),
panel.grid.major = element_blank(),
panel.grid.minor = element_blank(),
plot.title = element_text(hjust = 0.5))
grid.arrange(loan_train, loan_test, ncol=2)
contact_train <- ggplot(raw_train_data, aes(x=contact)) +
geom_bar(fill ="#3526e0") + ggtitle("Contact Distribution - Train") +
labs(y = "", x="") + theme_minimal() + theme(text = element_text(face = "plain"),
legend.position = "none",
axis.text.x = element_text(size = 7, angle = 90),
panel.grid.major = element_blank(),
panel.grid.minor = element_blank(),
plot.title = element_text(hjust = 0.5))
contact_test <- ggplot(raw_test_data, aes(x=contact)) +
geom_bar(fill = "#e08926") + ggtitle("Contact Distribution - Test") +
labs(y = "", x="") + theme_minimal() + theme(text = element_text(face = "plain"),
legend.position = "none",
axis.text.x = element_text(size = 7, angle = 90),
panel.grid.major = element_blank(),
panel.grid.minor = element_blank(),
plot.title = element_text(hjust = 0.5))
grid.arrange(contact_train, contact_test, ncol=2)
month_train <- ggplot(raw_train_data, aes(x=month)) +
geom_bar(fill ="#3526e0") + ggtitle("Month Distribution - Train") +
labs(y = "", x="") + theme_minimal() + theme(text = element_text(face = "plain"),
legend.position = "none",
axis.text.x = element_text(size = 7, angle = 90),
panel.grid.major = element_blank(),
panel.grid.minor = element_blank(),
plot.title = element_text(hjust = 0.5))
month_test <- ggplot(raw_test_data, aes(x=month)) +
geom_bar(fill = "#e08926") + ggtitle("Month Distribution - Test") +
labs(y = "", x="") + theme_minimal() + theme(text = element_text(face = "plain"),
legend.position = "none",
axis.text.x = element_text(size = 7, angle = 90),
panel.grid.major = element_blank(),
panel.grid.minor = element_blank(),
plot.title = element_text(hjust = 0.5))
grid.arrange(month_train, month_test, ncol=2)
poutcome_train <- ggplot(raw_train_data, aes(x=poutcome)) +
geom_bar(fill ="#3526e0") + ggtitle("poutcome Distribution - Train") +
labs(y = "", x="") + theme_minimal() + theme(text = element_text(face = "plain"),
legend.position = "none",
axis.text.x = element_text(size = 7, angle = 90),
panel.grid.major = element_blank(),
panel.grid.minor = element_blank(),
plot.title = element_text(hjust = 0.5))
poutcome_test <- ggplot(raw_test_data, aes(x=poutcome)) +
geom_bar(fill = "#e08926") + ggtitle("poutcome Distribution - Test") +
labs(y = "", x="") + theme_minimal() + theme(text = element_text(face = "plain"),
legend.position = "none",
axis.text.x = element_text(size = 7, angle = 90),
panel.grid.major = element_blank(),
panel.grid.minor = element_blank(),
plot.title = element_text(hjust = 0.5))
grid.arrange(poutcome_train, poutcome_test, ncol=2)
correlation <- cor(cont_var)
p <- plot_ly(
x = continuous_var,
y=continuous_var,
z = correlation,
type = "heatmap",
colorscale=list(c(0, "rgb(0,0,255)"),
list(0.1, "rgb(51,153,255)"),
list(0.2, "rgb(102,204,255)"),
list(0.3, "rgb(153,204,255)"),
list(0.4, "rgb(204,204,255)"),
list(0.5, "rgb(255,255,255)"),
list(0.6, "rgb(255,204,255)"),
list(0.7, "rgb(255,153,255)"),
list(0.8, "rgb(255,102,204)"),
list(0.9, "rgb(255,102,102)"),
list(1, "rgb(255,0,0)")),
title = "Correlation Matrix"
)
p <- layout(p, title="Continuous Variable Correlations")
p
## Warning: 'heatmap' objects don't have these attributes: 'title'
## Valid attributes include:
## 'type', 'visible', 'showlegend', 'legendgroup', 'opacity', 'name', 'uid', 'ids', 'customdata', 'selectedpoints', 'hoverinfo', 'hoverlabel', 'stream', 'transforms', 'z', 'x', 'x0', 'dx', 'y', 'y0', 'dy', 'text', 'transpose', 'xtype', 'ytype', 'zsmooth', 'connectgaps', 'xgap', 'ygap', 'zhoverformat', 'zauto', 'zmin', 'zmax', 'colorscale', 'autocolorscale', 'reversescale', 'showscale', 'colorbar', 'xcalendar', 'ycalendar', 'xaxis', 'yaxis', 'idssrc', 'customdatasrc', 'hoverinfosrc', 'zsrc', 'xsrc', 'ysrc', 'textsrc', 'key', 'set', 'frame', 'transforms', '_isNestedKey', '_isSimpleKey', '_isGraticule', '_bbox'
cont_box <- ggplot(cont_var_melt, aes(variable, value)) +
geom_boxplot(aes(fill = variable)) +
coord_flip() +
scale_fill_tableau() +
labs(x = "", y = "") +
theme_minimal() +
theme(text = element_text(face = "bold"),
legend.position = "none",
panel.grid.major = element_blank(),
panel.grid.minor = element_blank(),
plot.title = element_text(hjust = 0.5),
axis.text.x = element_blank())
cont_box
df_cont_norm <- raw_train_data[,..continuous_var]
df_cont_norm <- as.data.frame(apply(df_cont_norm, 2,function(x)((x - min(x))/(max(x)-min(x)))))
df_cont_norm <- as.data.frame(melt(df_cont_norm))
cont_box_norm <- ggplot(df_cont_norm, aes(variable, value)) +
geom_boxplot(aes(fill = variable)) +
coord_flip() +
scale_fill_tableau() +
labs(x = "", y = "") +
theme_minimal() +
theme(text = element_text(face = "bold"),
legend.position = "none",
panel.grid.major = element_blank(),
panel.grid.minor = element_blank(),
plot.title = element_text(hjust = 0.5),
axis.text.x = element_blank())
cont_box_norm
disc_box <- ggplot(df_disc, aes(Var2, as.numeric(value))) +
geom_boxplot(aes(fill = Var2)) +
scale_y_continuous(breaks = scales::pretty_breaks(n = 10)) +
scale_x_discrete(expand = c(0,0)) +
facet_wrap(~Var2, scales = "free", ncol = 1) +
scale_fill_tableau() +
ggtitle("Distribution of each discrete variable") +
labs(fill = "", x = "", y = "") +
coord_flip() +
theme_light() +
theme(text = element_text(face = "bold"),
legend.position = "none",
axis.text.x = element_blank(),
panel.grid.major = element_blank(),
panel.grid.minor = element_blank(),
plot.title = element_text(hjust = 0.5),
strip.background = element_blank(),
strip.text.x = element_blank())
disc_box
ggplot(raw_train_data, aes(x=age, color = y)) +
geom_density(alpha = 0.7, size=1.5) +
ggtitle("Age Target Distribution") + theme_classic() +
labs(color = "Offer accepted") +
scale_color_manual(values=c("#995052", "#529950"))
ggplot(raw_train_data, aes(x=balance, color = y)) +
geom_density(alpha = 0.7, size=1.5) +
ggtitle("Balance Target Distribution") + theme_classic() +
labs(color = "Offer accepted") +
scale_color_manual(values=c("#995052", "#529950"))
ggplot(raw_train_data, aes(x=day, color = y)) +
geom_density(alpha = 0.7, size=1.5) +
ggtitle("Day Target Distribution") + theme_classic() +
labs(color = "Offer accepted") +
scale_color_manual(values=c("#995052", "#529950"))
ggplot(raw_train_data, aes(x=duration, color = y)) +
geom_density(alpha = 0.7, size=1.5) +
ggtitle("Duration Target Distribution") + theme_classic() +
labs(color = "Offer accepted") +
scale_color_manual(values=c("#995052", "#529950"))
ggplot(raw_train_data, aes(x=campaign, color = y)) +
geom_density(alpha = 0.7, size=1.5) +
ggtitle("Campaign Target Distribution") + theme_classic() +
labs(color = "Offer accepted") +
scale_color_manual(values=c("#995052", "#529950"))
ggplot(raw_train_data, aes(x=pdays, color = y)) +
geom_density(alpha = 0.7, size=1.5) +
ggtitle("pdays Target Distribution") + theme_classic() +
labs(color = "Offer accepted") +
scale_color_manual(values=c("#995052", "#529950"))
ggplot(raw_train_data, aes(x=previous, color = y)) +
geom_density(alpha = 0.7, size=1.5) +
ggtitle("Previous Target Distribution") + theme_classic() +
labs(color = "Offer accepted") +
scale_color_manual(values=c("#995052", "#529950"))
ggplot(raw_train_data,mapping = aes(job,fill=y))+
geom_bar(col="black", position = position_dodge(1))+
geom_text(stat="count",aes(label=scales::percent((..count..)/sum(..count..))),
position=position_dodge(1), size=3, vjust = -.5)+
theme_classic() +
scale_fill_manual("Offer Accepted", values = c("no" = "#995052", "yes" = "#529950"))
ggplot(raw_train_data,mapping = aes(marital,fill=y))+
geom_bar(col="black", position = position_dodge(1))+
geom_text(stat="count",aes(label=scales::percent((..count..)/sum(..count..))),
position=position_dodge(1), size=3, vjust = -.5)+
theme_classic() +
scale_fill_manual("Offer Accepted", values = c("no" = "#995052", "yes" = "#529950"))
ggplot(raw_train_data,mapping = aes(education,fill=y))+
geom_bar(col="black", position = position_dodge(1))+
geom_text(stat="count",aes(label=scales::percent((..count..)/sum(..count..))),
position=position_dodge(1), size=3, vjust = -.5)+
theme_classic() +
scale_fill_manual("Offer Accepted", values = c("no" = "#995052", "yes" = "#529950"))
ggplot(raw_train_data,mapping = aes(default,fill=y))+
geom_bar(col="black", position = position_dodge(1))+
geom_text(stat="count",aes(label=scales::percent((..count..)/sum(..count..))),
position=position_dodge(1), size=3, vjust = -.5)+
theme_classic() +
scale_fill_manual("Offer Accepted", values = c("no" = "#995052", "yes" = "#529950"))
ggplot(raw_train_data,mapping = aes(housing,fill=y))+
geom_bar(col="black", position = position_dodge(1))+
geom_text(stat="count",aes(label=scales::percent((..count..)/sum(..count..))),
position=position_dodge(1), size=3, vjust = -.5)+
theme_classic() +
scale_fill_manual("Offer Accepted", values = c("no" = "#995052", "yes" = "#529950"))
ggplot(raw_train_data,mapping = aes(loan,fill=y))+
geom_bar(col="black", position = position_dodge(1))+
geom_text(stat="count",aes(label=scales::percent((..count..)/sum(..count..))),
position=position_dodge(1), size=3, vjust = -.5)+
theme_classic() +
scale_fill_manual("Offer Accepted", values = c("no" = "#995052", "yes" = "#529950"))
ggplot(raw_train_data,mapping = aes(contact,fill=y))+
geom_bar(col="black", position = position_dodge(1))+
geom_text(stat="count",aes(label=scales::percent((..count..)/sum(..count..))),
position=position_dodge(1), size=3, vjust = -.5)+
theme_classic() +
scale_fill_manual("Offer Accepted", values = c("no" = "#995052", "yes" = "#529950"))
ggplot(raw_train_data,mapping = aes(month,fill=y))+
geom_bar(col="black", position = position_dodge(1))+
geom_text(stat="count",aes(label=scales::percent((..count..)/sum(..count..))),
position=position_dodge(1), size=3, vjust = -.5)+
theme_classic() +
scale_fill_manual("Offer Accepted", values = c("no" = "#995052", "yes" = "#529950"))
ggplot(raw_train_data,mapping = aes(poutcome,fill=y))+
geom_bar(col="black", position = position_dodge(1))+
geom_text(stat="count",aes(label=scales::percent((..count..)/sum(..count..))),
position=position_dodge(1), size=3, vjust = -.5)+
theme_classic() +
scale_fill_manual("Offer Accepted", values = c("no" = "#995052", "yes" = "#529950"))